= GPSD-NG: A Case Study in Application Protocol Evolution =
:description: A case study in the evolution of the gpsd protocol
:keywords: GPSD, protocol, evolution
Eric S. Raymond <esr@thyrsus.com>
v1.5.2, January 2016

This document is mastered in asciidoc format.  If you are reading it in HTML,
you can find the original at the GPSD project website.


== Introduction ==

GPSD is a service daemon that collects data from serial and USB GPS
sensors attached to a host machine and presents it in a
simple-to-parse form on TCP/IP port 2947.  This is a less trivial task
than it sounds, because GPS sensor interfaces are both highly variable
and really badly designed (see http://esr.ibiblio.org/?p=801[Why GPSes
suck, and what to do about it] for a description of NMEA 0183 and
other horrors).

In this paper, however, we will be ignoring all the dodgy stuff that
goes on at GPSD's back end to concentrate on what happens at the front
- the request-response protocol through which client programs get
access to the information that GPSD acquires from its devices and
internal computations.

The GPSD request-response protocol is entering its third generation of
design, and I think the way it has evolved spotlights some interesting
design issues and long-term trends in the design of network protocols
in general.  To anticipate, these trends are: (1) changing tradeoffs
of bandwidth economy versus extensibility and explicitness, (2) a
shift from lockstep conversational interfaces to event streams, (3)
changes in the "sweet spot" of protocol designs due to increasing use
of scripting languages, and (4) protocols built on metaprotocols.

Carrying these trends forward may even give us a bit of a glimpse at
the future of application-protocol design.

== The first version: a simple conversational protocol ==

The very first version of GPSD, back in the mid-1990s, handled
NMEA GPSes only and was designed with a dead-simple request-response
protocol.  To get latitude and longitude out of it, you'd connect
to port 2947 and have a conversation that looked like this:

-------------------------------------------------------------------------
-> P
<- GPSD,P=4002.1207 07531.2540
-------------------------------------------------------------------------

That is GPSD reporting, the only way it could in the earliest protocol
version, that I'm at latitude about 40 north and 75 west.

If you are a mathematician or a physicist, you're probably noticing
some things missing in this report.  Like a timestamp, and a circular
error estimate, and an altitude.  In fact, it was possible to get some
these data using the old protocol. You could make a compound request
like this:

-------------------------------------------------------------------------
-> PAD
<- GPSD,P=4002.1207 07531.2540,A=351.27,D=2009:07:11T11:16Z
-------------------------------------------------------------------------

For some devices (not all) you could add E and get error estimates.
Other data such as course and rate of climb/sink might be available
via other single-letter commands. I say "might be" because in those
early days gpsd didn't attempt to compute error estimates or velocities
if the GPS didn't explicitly supply them.  I fixed that, later, but
this essay is about protocol design so I'm going to ignore all the
issues associated with the implementation for the rest of the discussion.

The version 1 protocol is squarely in the tradition of classic textual
Internet protocols, even though it doesn't look much like (say) SMTP
transactions - requests are simple to emit and responses are easy to
parse. It was clearly designed with the more specific goal of
minimizing traffic volume between the daemon and its clients. It
accomplishes that goal quite well.

== The second version: from conversational to streaming ==

However, when I started work on it it in 2004 there was already
pressure from the existing userbase to change at least one of the
protocol's major assumptions - that is, that the client would poll
whenever it wanted data. It's usually more convenient to be able to
say to the daemon "Speak!" and have it stream TPV
(time/position/velocity) reports back at you at the sensor's sampling
rate (usually once per second).  Especially when, as with GPSD, you
have a client library that can spin in a thread picking up the updates
and dropping them in a struct somewhere that you specify.

This was the first major feature I implemented.  I called it "watcher
mode", and it required me to add two commands to the protocol. There
were already so many single-shot commands defined that we were close
to running out of letters for new ones; I was able to grab "W" for the
command that enables or disables watcher mode, but was left with the
not-exactly-intuitive "O" for the streaming TPV report format.  Here's
how it looks:

-------------------------------------------------------------------------
-> W=1
<- GPSD,W=1
<- GPSD,O=MID2 1118327700.280 0.005 46.498339529 7.567392712 1342.392 36.000 32.321 10.3787 0.091 -0.085 ? 38.66 ? 3
<- GPSD,O=MID2 1118327701.280 0.005 46.498339529 7.567392712 1342.392 48.000 32.321 10.3787 0.091 -0.085 ? 50.67 ? 3
<- GPSD,O=MID2 1118327702.280 0.005 46.498345996 7.567394427 1341.710 36.000 32.321 10.3787 0.091 -0.085 ? 38.64 ? 3
<- GPSD,O=MID2 1118327703.280 0.005 46.498346855 7.567381517 1341.619 48.000 32.321 10.3787 0.091 -0.085 ? 50.69 ? 3
<- GPSD,Y=MID4 1118327704.280 8:23 6 84 0 0:28 7 160 0 0:8 66 189 45 1:29 13 273 0 0:10 51 304 0 0:4 15 199 34 1:2 34 241 41 1:27 71 76 42 1:
<- GPSD,O=MID2 1118327704.280 0.005 46.498346855 7.567381517 1341.619 48.000 32.321 10.3787 0.091 -0.085 ? ? ? 3
-> W=0
<- GPSD,W=0
-------------------------------------------------------------------------

The fields in the O report are tag (an indication of the device
sentence that produced this report), time, time error estimate,
longitude, latitude, altitude, horizontal error estimate, vertical
error estimate, course, speed, climb/sink, error estimates for
those last three fields, and mode (an indication of fix quality).  If
you care about issues like reporting units, read the documentation.

The 'Y' report is a satellite skyview, giving right-ascension,
declination, and signal quality for each of the visible satellites.
GPSes usually report this every five cycles (seconds).

The 'W', 'O' and 'Y' sentences, together, effectively constituted
version 2 of the protocol - designed for streaming use. The other
single-shot commands, though still supported, rapidly became
obsolescent.

Attentive readers may wonder why I designed a novel 'O' format rather
that writing the watcher-mode command so that it could specify a
compound report format (like PADE) every second.  Part of the answer
is, again, that we were running out of letters to associate with new
data fields like the error estimates.  I wanted to use up as little of
the remaining namespace as I could get away with.

Another reason is, I think, that I was still half-consciously thinking
of bit bandwidth as a scarce resource to be conserved.  I had a bias
against designs that would associate "extra" name tags with the
response fields ("A=351.27") even though the longest tagged response
GPSD could be expected to generate would still be shorter than a
single Ethernet packet (1509 bytes).

== Pressure builds for a redesign ==

Along about 2006, despite my efforts to conserve the remaining
namespace, we ran out of letters completely. As the PADE example
shows, the protocol parser interprets command words letter
by letter, so trying to wedge longer commands in by simple
fiat wouldn't work. Recruiting non-letter characters as
command characters would have been ugly and only postponed
the problem a bit, not solved it.

'H' is actually still left, but at the time I believed we couldn't
commit the last letter (whatever it was) because we'd need it as an
inline switch to a new protocol.  I started feeling pressure to
actually design a new protocol.  Besides running out of command
namespace in the old one, a couple of things were happening that
implied we'd need to define new commands.

What had used up the last of the command namespace was multi-device
support.  Originally, GPSD could only monitor one GPS at a time. I
re-engineered it so it could monitor multiple GPSes, with GPS streams
available as data channels to which a client could connect one at a
time.  I was thinking about use cases like this one: spot two GPSes on
either end of an oil tanker, use the position delta as a check on
reported true course.

(For those of you wondering, this wasn't the huge job it may sound
like.  I had carefully structured GPSD as a relatively small (about
5.5 KLOC) networking and dispatcher top-level calling a 30 KLOC driver
and services library, all of which was designed from the get-go to use
re-entrant structures.  Thus, only the top layer needed to change, and
at that only about 1 KLOC of it actually did. Building the test
framework to verify the multi-device code in action was a bigger job.)

Note that the "one at a time" limitation was imposed by the
protocol design, notably the fact that the 'O' record didn't contain
the name of the device it was reporting from.  Thus, GPSD could not
mix reports from different devices without effectively discarding
information about where they had come from.

Though I had just barely managed to cram in multi-GPS support without
overrunning the available command space, we were starting to look at
monitoring multiple *kinds* of devices in one session - RTCM2
correction sources and NTRIP were the first examples. (These are both
protocols that support
http://www.esri.com/news/arcuser/0103/differential1of2.html[differential
GPS correction].) My chief lieutenant was muttering about making GPSD
report raw pseudorange data from the sensors that allow you to get at
that.  It was abundantly clear that broadening GPSD's scope was going
to require command-set extensions.

Even though I love designing application protocols only a little bit
less than I love designing domain-specific minilanguages, I dragged my
feet on tackling the GPSD-NG redesign for three years. I had a strong
feeling that I didn't understand the problem space well enough, and
that jumping into the effort prematurely might lock in some mistakes
that I would come to gravely regret later on.

== JSON and the AISonauts ==

What finally got me off the dime in early 2009 were two developments - the
push of AIS and the pull of JSON.

AIS is the marine http://www.navcen.uscg.gov/enav/ais/[Automatic
Identification System]. All the open-source implementations of AIS
packet decoding I could find were sketchy, incomplete, and not at a
quality level I was comfortable with.  It quickly became apparent that
this was due to a paucity of freely available public information about
the applicable standards.

http://esr.ibiblio.org/?p=888[I fixed that problem] - but having done
so, I was faced with the problem of just how GPSD is supposed to
report AIS data packets to clients in a way that can't be confused
with GPS data.  This brought the GPSD-NG design problem to the front
burner again.

Fortunately, my AIS-related research also led me to discover
http://www.json.org/[JSON], aka JavaScript Object Notation.  And JSON
is *really nifty*, one of those ideas that seem so simple and
powerful and obvious once you've seen it that you wonder why it wasn't
invented sooner.

In brief, JSON is a lightweight and human-readable way to serialize
data structures equivalent to Python dictionaries, with attributes
that can be numbers, strings, booleans, nested dictionary objects,
or variable-extent lists of any of these things.

== GPSD-NG is born ==

I had played with several different protocol design possibilities
between 2006 and 2009, but none of them really felt right. My
breakthrough moment in the GPSD-NG design came when I thought this:
"Suppose all command arguments to GPSD-NG commands, and their
responses, were self-describing JSON objects?"

In particular, the equivalent of the 'O' report shown above looks like
this in GPSD-NG (with some whitespace added to avoid hard-to-read
linewraps):

-------------------------------------------------------------------------
{"class":"TPV","tag":"MID50","device":"/dev/pts/1",
   "time":"2005-06-09T14:35:11.79",
   "ept":0.005,"lat":46.498333338,"lon":7.567392712,"alt":1341.667,
   "eph":48.000,"epv":32.321,"track":60.9597,"speed":0.161,"climb":-0.074,
   "eps":50.73,"mode":3}
-------------------------------------------------------------------------

To really appreciate what you can do with object-valued attributes,
however, consider this JSON equivalent of a 'Y' record.  The skyview
is a sublist of objects, one per satellite in view:

-------------------------------------------------------------------------
{"class":"SKY","tag":"MID2","device":"/dev/pts/1",
   "time":"2005-06-09T14:35:11.79",
   "reported":8,"satellites":[
   {"PRN":23,"el":6,"az":84,"ss":0,"used":false},
   {"PRN":28,"el":7,"az":160,"ss":0,"used":false},
   {"PRN":8,"el":66,"az":189,"ss":40,"used":true},
   {"PRN":29,"el":13,"az":273,"ss":0,"used":false},
   {"PRN":10,"el":51,"az":304,"ss":36,"used":true},
   {"PRN":4,"el":15,"az":199,"ss":27,"used":false},
   {"PRN":2,"el":34,"az":241,"ss":36,"used":true},
   {"PRN":27,"el":71,"az":76,"ss":43,"used":true}
   ]}
-------------------------------------------------------------------------

(Yes, those "el" and "az" attributes are elevation and azimuth.  "PRN"
is the satellite ID; "ss" is signal strength in decibels, and "used"
is a flag indicating whether the satellite was used in the current solution."

These are rather more verbose than the 'O' or 'Y' records, but have several
compensating advantages:

* Easily extensible.  If we need to add more fields, we just add named
  attributes.  This is especially nice because...
* Fields with undefined values can be omitted.  This means extension
  fields don't weigh down the response format when we aren't using them.
* It's explicit.  Much easier to read with eyeball than the corresponding
  'O' record.
* It includes the name of the device reporting the fix. This opens up
  some design possibilities I will discuss in more detail in a bit.
* It includes, up front, a "class" tag that tells client software what it
  is, which can be used to drive a parse.

My first key decision was that these benefits are a good trade for the
increased verbosity.  I had to wrestle with this a bit; I've been
programming a long time, and (as I mentioned previously) have reflexes
from elder days that push me to equate "good" with "requiring minimum
computing power and bandwidth".  I reminded myself that it's 2009 and
machine resources are cheap; readability and extensibility are the goals
to play for.

Once I had decided that, though, there remained another potential
blocker.  The implementation language of gpsd and its principal client
library is C.  There are lots of open-source JSON parsers in C out
there, but they all have the defect of requiring malloc(3) and handing
back a dynamic data structure that you then have to pointer-walk at
runtime.

This is a problem, because one of my design rules for gpsd is no use
of malloc. Memory leaks in long-running service daemons are bad things;
using only static, fixed-extent data structures is a brutally effective
strategy for avoiding them. Note, this is only possible because the maximum
size of the packets gpsd sees is fairly small, and its algorithms are O(1)
in memory utilization.

"Um, wait..." I hear you asking "...why accept that constraint when
gpsd hasn't had a requirement to parse JSON yet, just emit it as
responses?"  Because I fully expected gpsd to have to parse structured
JSON arguments for commands.  Here's an example, which I'll explain fully
later but right now just hint at the (approximate) GPSD-NG equivalent
of a 'W+R+' command.

-------------------------------------------------------------------------
?WATCH={"raw":1,nmea:true}
-------------------------------------------------------------------------

Even had I not anticipated parsing JSON arguments in gpsd, I try to
limit malloc use in the client libraries as well.  Before the
new-protocol implementation the client library only used two calloc(3)
calls, in very careful ways. Now they use none at all.

So my next challenge was to write and verify a tiny JSON parser that
is driven by sets of fixed-extent structures - they tell it what shape
of data to expect and at which static locations to drop the actual
parsed data; if the shape does not match what's expected, error out.
Fortunately, I am quite good at this sort of hacking - the
result, after a day and a half of work, fit in 310 LOC including
comments (but not including 165 LOC of unit-test code).

== Un-channeling: the power ==

Both gpsd and its C client library could now count on parsing JSON;
that gave me my infrastructure.  And an extremely strong one, too;
the type ontology of JSON is rich enough that I'm not likely to ever
have to replace it.  Of course this just opened up the next question -
now that I can readily pass complex objects between gpsd and its
client libraries, what do I actually do with this capability?

The possibility that immediately suggested itself was "get rid of channels".
In the old interface, subscribers could only listen to one device at
a time - again, this was a consequence of the fact that 'O' and 'Y'
reports were designed before multi-device support and didn't include a
device field.  JSON reports can easily include a device field and
thus need not have this problem.

Instead of a channel-oriented interface, then, how about one where the
client chooses what classes of message to listen to, and then gets
them from all devices?

Note, however, that including the device field raises some problems of
its own. I do most of my gpsd testing with a utility I wrote called
gpsfake, which feeds one or more specified data logs through pty
devices so gpsd sees them as serial devices.  Because X also uses pty
devices for virtual terminals, the device names that a gpsd instance
running under gpsfake sees may depend on random factors like the
number of terminal emulators I have open.  This is a problem when
regression-testing!  I thought this issue was going to require me
to write a configuration command that suppresses device display; I
ended up writing a sed filter in my regression-test driver instead.

Now we come back to our previous example:

-------------------------------------------------------------------------
?WATCH={"raw":true,nmea:true}
-------------------------------------------------------------------------

This says: "Stream all reports from all devices at me, setting raw
mode and dumping as pseudo-NMEA if it's a binary protocol." The way to
add more controls to this is obvious, which is sort of the point --
nothing like this could have fit in the fixed-length syntax of the old
pre-JSON protocol.

This is not mere theory. At the time of writing, the ?WATCH command is
fully implemented in gpsd's Subversion repository, and I expect it to
ship ready for use in our next release (2.90).  Total time to build
and test the JSON parsing infrastructure, the GPSD-NG parser, and the
gpsd internals enhancements needed to support multi-device listening?
About a working week.

Just to round out this section, here is an example of what an
actual AIS transponder report looks like in JSON.

-------------------------------------------------------------------------
{"class"="AIS","msgtype":5,"repeat":0,"mmsi":"351759000","imo":9134270,
   "ais_version":0,"callsign":"3FOF8","shipname":"EVER DIADEM",
   "shiptype":70,"to_bow":225,"to_stern":70,"to_port":1,"to_starboard":31,
   "epfd":1,"eta":05-15T14:00Z,"draught":122,"destination":"NEW YORK",
   "dte":0}
-------------------------------------------------------------------------

The above is an AIS type 5 message identifying a ship - giving, among
other things, the ship's name and radio callsign and and destination
and ETA.  You might get this from an AIS transceiver, if you had one
hooked up to your host machine; gpsd would recognize those data
packets coming in and automatically make AIS reports available as
an event stream.

== The lessons of history ==

In the introduction, I called out three trends apparent over time in
protocol design.  Let's now consider these in more detail.

=== Bandwidth economy versus extensibility and explicitness ===

First, I noted *changing tradeoffs of bandwidth economy versus
extensibility and explicitness*.

One way you can compare protocols is by the amount of overhead they
incur.  In a binary format this is the percentage of the bit stream
that goes to magic numbers, framing bits, padding, checksums, and
the like.  In a textual format the equivalent is the percentage
of the bitstream devoted to field delimiters, sentence start and
sentence-end sentinels, and (in protocols like NMEA 0183) textual
checksum fields.

Another way you can compare protocols is by implicitness versus
explicitness.  In the old GPSD protocol, you know the semantics of a
request parameter within a request implicitly, by where it is in
the order. In GPSD-NG, you know more explicitly because every parameter is a
name-attribute pair and you can inspect the name.

Extensibility is the degree to which the protocol can have new
requests, responses, and parameters added without breaking old
implementations.

In general, *both extensibility and overhead rise with the degree
of explicitness in the protocol*.  The JSON-based TPV record has
has much higher overhead than the O record it replaces, but what
we gain from that is lots and *lots* of extensibility room. We
win three different ways:

* The command/response namespace in inexhaustibly huge.
* Individual requests and responses can readily be extended by adding
  new attributes without breaking old implementations.
* The type ontology of JSON is rich enough to make passing arbitrarily
  complex data structures through it very easy.

With respect to the tradeoffs between explicitness/extensibility and
overhead, we're at a very different place on the cost-benefit curves
today from when the original GPSD protocol was designed.

Communications costs for the pipes that GPSD uses have
dropped by orders of magnitude in the decade-and-change since GPSD
was designed. Thus, squeezing every last bit of overhead out of the
protocol representation doesn't have the real economic payoff it used to.

Under modern conditions, there is a strong case that implicit,
tightly-packed protocols are false economy. If (as with the first GPSD
protocol) they're so inextensible that natural growth in the
software breaks them, that's a clear down-check.  It's better to design
for extensibility up front in order to avoid having to throw out
a lot of work later on.

The direction this points in for the future is clear, especially
in combination with the increasing use of metaprotocols.

=== From lockstep to streaming ===

Second, I noted *a shift from lockstep conversational interfaces to
event streams*.

The big change in the second protocol version was watcher mode.  One
of the possibilities this opens up is that you can put the report
interpreter into an asychronous thread that magically updates a C
struct for you every so often, without the rest of your program having
to know or care how that is being done (except possibly by waiting a
mutex to ensure it doesn't read a partially-updated state).

Analogous developments have been visible in other Internet protocols
over roughly the same period.  Compare, for example, POP3 to IMAP. The
former is a lockstep protocol, the latter designed for streaming - it's
why IMAP responses have a transaction ID tying them back to the
requesting command, so responses that are out of order due to
processing delays can be handled sanely.

Systems software has generally been moving in a similar direction,
propelled there by distributed processing and networks with unavoidable
variable delays.  There is a distant, but perceptible, relationship
between GPSD-NG's request-response objects and the way transactions
are handled within (for example) the X window system.

This trend, too, seems certain to continue, as the Internet becomes
ever more like one giant distributed computing system.

=== Type ontology recapitulates trends in language design ===

Third, *changes in the "sweet spot" of protocol designs
due to increasing use of scripting languages.*

The most exciting thing about JSON to me, speaking as an application
protocol designer, is the rich type ontology - booleans, numbers,
strings, lists, and dictionaries - and the ability to nest them to any
level. In an important sense that is orthogonal to raw bandwidth,
this makes the pipe wider - it means complex, structured data can more
readily be passed through with a minimum of fragile and bug-prone
serialization/deserialization code.

The fact that I could build a JSON parser to unpack to fixed-extent C
structures in 300-odd LOC demonstrates that this effect is a powerful
code simplifier even when the host language's type ontology is limited
to fixed-extent types and poorly matched to that of JSON (C lacks not
only variable-extent lists but also dictionaries).

JSON is built on dictionaries; in fact, every JSON object is a legal
structure literal in the dictionary-centric Python language (with one
qualified exception near the JSON null value). It seems like a simple
idea in 2009, but the apparent simplicity relies on folk knowledge we
didn't have before Perl introduced dictionaries as a first-class data
type (c.1986) and Python built an object system around them (after
1991).

Thus, GPSD-NG (and the JSON it's built on) reflects and recapitulates
long-term trends in language design, especially those associated with
the rise of scripting languages and of dictionaries as a  first-class
type within them.

This produces several mutually reinforcing feedback loops.  The
rise of scripting languages makes it easier to use JSON to its full
potential, if only because deserialization is so trivial.  JSON will
probably, in turn, promote the use of these languages.

I think, in the future, application protocol designers will become
progressively less reluctant to rely on being able to pass around
complex data structures.  JSON distils the standard type ontology of
modern scripting languages (Perl, Python, Ruby, and progeny) into a
common data language that is far more expressive than the structs of
yesteryear.

== Protocols on top of metaprotocols ==

GPSD-NG is an application of JSON.  Not a completely pure one; the
request identifiers, are, for convenience reasons, outside the JSON
objects. But close enough.

In recent years, metaprotocols have become an important weapon in
the application-protocol designer's toolkit.  XML, and its
progeny SOAP and XML-RPC, are the best known metaprotocols. YAML
(of which JSON is essentially a subset) has a following as well.

Designing on top of a metaprotocol has several advantages.  The most
obvious one is the presence of lots of open-source software to use for
parsing the metaprotocol.

But it is probably more important in the long run that it saves one
from having to reinvent a lot of wheels and ad-hoc representations
at the design level.  This effect is muted in XML, which has a weak
type ontology, but much more pronounced in YAML or JSON.  As a
relevant example, I didn't have to think three seconds about the right
representation even for the relatively complex SKY object.

== Paths not taken ==

Following the first public release of this paper, the major questions
to come up from early readers were "Why not XML?" and "Why not a
super-efficient packed binary protocol?"

I would have thought the case against packed binary application
protocols was obvious from my preceding arguments, but I'll make it
explicit here: generally, they are even more rigid and inextensible
than a textual protocol relying on parameter ordering, and hence more
likely to break as your application evolves. They have significant
portability issues around things like byte order in numeric fields.
They are opaque; they cannot be audited or analyzed without bug-prone
special-purpose tools, adding a forbidding degree of complexity and
friction to the life-cycle maintenance costs.

When the type ontology of your application includes only objects like
strings or numbers that (as opposed to large binary blobs like images)
have textual representations differing little in size from packed
binary, there is no case at all for incurring these large overheads.

The case against XML is not as strong. An XML-based protocol at least
need not be rigidly inextensible and opaque. XML's problem is that,
while it's a good basis for document interchange, it doesn't naturally
express the sorts of data structures cooperating applications want to
pass around.

While such things can be layered over XML with an appropriate schema,
the apparatus required for schema-aware parsing is necessarily
complicated and heavyweight - certainly orders of magnitude more so
than the little JSON parser I wrote. And XML itself is pretty
heavyweight, too - one's data tends to stagger under the bulk
of the markup parts.

== Envoi ==

Finally, a note of thanks to the JSON developers...

I think JSON does a better job of nailing the optimum in metaprotocols
than anything I've seen before - its combination of simplicity and
expressiveness certainly isn't matched by XML, for reasons already
called out in my discussion of paths not taken.

I have found JSON pleasant to work with, liberating, and
thought-provoking; hence this paper. I will certainly reach for this
Swiss-army knife first thing, next time I have to design an
application protocol.
